Add ceilf16 and ceilf128 #436

Merged: 4 commits merged on Jan 22, 2025

tgross35 commented Jan 13, 2025

Add a generic version of `ceil`. Additionally, make use of this version to implement `ceil` and `ceilf`.

Musl's `ceilf` algorithm seems to work better for all versions of the functions. Testing with a generic version of musl's `ceil` routine showed the following regressions:

icount::icount_bench_ceil_group::icount_bench_ceil logspace:setup_ceil()
Performance has regressed: Instructions (14064 > 13171) regressed by +6.78005% (>+5.00000)
  Baselines:                      softfloat|softfloat
  Instructions:                       14064|13171                (+6.78005%) [+1.06780x]
  L1 Hits:                            16697|15803                (+5.65715%) [+1.05657x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               7|8                    (-12.5000%) [-1.14286x]
  Total read+write:                   16704|15811                (+5.64797%) [+1.05648x]
  Estimated Cycles:                   16942|16083                (+5.34104%) [+1.05341x]
icount::icount_bench_ceilf_group::icount_bench_ceilf logspace:setup_ceilf()
Performance has regressed: Instructions (14732 > 9901) regressed by +48.7931% (>+5.00000)
  Baselines:                      softfloat|softfloat
  Instructions:                       14732|9901                 (+48.7931%) [+1.48793x]
  L1 Hits:                            17494|12611                (+38.7202%) [+1.38720x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               6|6                    (No change)
  Total read+write:                   17500|12617                (+38.7018%) [+1.38702x]
  Estimated Cycles:                   17704|12821                (+38.0860%) [+1.38086x]
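
For readers unfamiliar with the bit-level approach mentioned above, here is a minimal sketch of a `ceilf`-style routine in the manner of musl's, written for `f32` only. It is not the code from this PR: the name `ceil_f32_sketch` is made up for illustration, and the floating-point exception signalling that musl performs is left out.

```rust
/// Illustrative only: round `x` up to an integer by masking fraction bits.
fn ceil_f32_sketch(x: f32) -> f32 {
    const SIG_BITS: u32 = 23; // explicitly stored significand bits in an f32
    const EXP_BIAS: u32 = 127;

    let mut ix = x.to_bits();
    let negative = (ix >> 31) != 0;
    // Biased exponent field; kept unsigned and compared against the bias.
    let e = (ix >> SIG_BITS) & 0xff;

    if e >= EXP_BIAS + SIG_BITS {
        // No fraction bits remain (or x is infinite/NaN): already integral.
        return x;
    }

    if e >= EXP_BIAS {
        // |x| >= 1: `m` covers the bits that encode the fractional part.
        let m = 0x007f_ffff_u32 >> (e - EXP_BIAS);
        if (ix & m) == 0 {
            return x; // fraction is zero, x is an integer
        }
        if !negative {
            ix += m; // push positive values past the next integer
        }
        ix &= !m; // then clear the fraction bits
        f32::from_bits(ix)
    } else if negative {
        -0.0 // -1 < x <= -0.0 rounds up to negative zero
    } else if (ix << 1) != 0 {
        1.0 // 0 < x < 1 rounds up to one
    } else {
        x // +0.0 stays as it is
    }
}
```

With this sketch, `ceil_f32_sketch(1.5)` returns `2.0` and `ceil_f32_sketch(-1.5)` returns `-1.0`.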

Add `ceilf16` and `ceilf128`

Use the generic algorithms to provide implementations for these routines.
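
To illustrate how one generic routine can back several float widths, the sketch below writes the algorithm once against a small helper trait and instantiates it for `f32` and `f64`. Everything here is an assumption made for illustration: `BitsFloat`, `impl_bits_float!`, `ceil_generic` and the wrapper names are hypothetical and are not the crate's actual `Float` trait or functions. `f16` and `f128` are left out because they are not available on stable Rust; under this design they would presumably need only another trait implementation.

```rust
use core::ops::{Add, BitAnd, Not, Shr};

/// Hypothetical helper trait for this sketch; not the crate's `Float` trait.
trait BitsFloat: Copy {
    type Int: Copy
        + PartialEq
        + Add<Output = Self::Int>
        + BitAnd<Output = Self::Int>
        + Not<Output = Self::Int>
        + Shr<u32, Output = Self::Int>;
    const SIG_BITS: u32;
    const EXP_BIAS: u32;
    const SIG_MASK: Self::Int;
    const INT_ZERO: Self::Int;
    const ONE: Self;
    const NEG_ZERO: Self;
    fn bits(self) -> Self::Int;
    fn from_raw(bits: Self::Int) -> Self;
    fn biased_exp(self) -> u32;
    fn negative(self) -> bool;
}

macro_rules! impl_bits_float {
    ($f:ty, $i:ty, $sig:expr) => {
        impl BitsFloat for $f {
            type Int = $i;
            const SIG_BITS: u32 = $sig;
            const EXP_BIAS: u32 = (1 << (<$i>::BITS - $sig - 2)) - 1;
            const SIG_MASK: $i = (1 << $sig) - 1;
            const INT_ZERO: $i = 0;
            const ONE: Self = 1.0;
            const NEG_ZERO: Self = -0.0;
            fn bits(self) -> $i {
                self.to_bits()
            }
            fn from_raw(bits: $i) -> Self {
                <$f>::from_bits(bits)
            }
            fn biased_exp(self) -> u32 {
                // Drop the significand, then mask off the sign bit.
                ((self.to_bits() >> $sig) as u32) & ((1 << (<$i>::BITS - $sig - 1)) - 1)
            }
            fn negative(self) -> bool {
                self.is_sign_negative()
            }
        }
    };
}

impl_bits_float!(f32, u32, 23);
impl_bits_float!(f64, u64, 52);

/// One algorithm shared by every type that implements `BitsFloat`.
fn ceil_generic<F: BitsFloat>(x: F) -> F {
    let e = x.biased_exp();
    if e >= F::EXP_BIAS + F::SIG_BITS {
        return x; // integral, infinite, or NaN
    }
    if e >= F::EXP_BIAS {
        // Mask of the bits that hold the fractional part.
        let m = F::SIG_MASK >> (e - F::EXP_BIAS);
        let ix = x.bits();
        if (ix & m) == F::INT_ZERO {
            return x;
        }
        let bumped = if x.negative() { ix } else { ix + m };
        F::from_raw(bumped & !m)
    } else if x.negative() {
        F::NEG_ZERO
    } else if x.bits() == F::INT_ZERO {
        x
    } else {
        F::ONE
    }
}

// Thin per-type wrappers, mirroring how public entry points could delegate.
fn ceilf_sketch(x: f32) -> f32 {
    ceil_generic(x)
}

fn ceil_sketch(x: f64) -> f64 {
    ceil_generic(x)
}
```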

tgross35 force-pushed the generic-ceil branch 3 times, most recently from a809ed4 to 21c19b5 (January 13, 2025 12:15)
tgross35 commented

i386 should probably get an asm implementation for `floorf`/`ceilf` as well, similar to `floor`/`ceil`.

tgross35 force-pushed the generic-ceil branch 3 times, most recently from f56c40f to 5cfb028 (January 22, 2025 07:08)
This may allow for small optimizations with larger float types, since
`u32` math can be used after shifting. LLVM may already be doing this
anyway.
tgross35 force-pushed the generic-ceil branch 2 times, most recently from b25aa4b to 94581b5 (January 22, 2025 07:12)
`exp` does not perform any form of unbiasing, so there isn't any reason
it should be signed. Change this.

Additionally, add `EPSILON` to the `Float` trait.
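
As a rough illustration of the two exponent notes above, the sketch below keeps the raw exponent field as a `u32` regardless of the float's width and makes unbiasing a separate, explicitly signed step. The trait `FloatSketch` and its items are hypothetical stand-ins, not the crate's `Float` trait. The accessor is named `exp_biased` here rather than `exp` only so that it does not collide with the inherent `f64::exp` method when called with method syntax.

```rust
/// Hypothetical stand-in for a float trait; names are illustrative only.
trait FloatSketch: Copy {
    /// Machine epsilon, exposed as an associated constant.
    const EPSILON: Self;
    const SIG_BITS: u32;
    const EXP_BIAS: u32;

    /// Raw exponent field, still biased, so it can never be negative.
    fn exp_biased(self) -> u32;

    /// Unbiasing is a separate step, and only this step needs a signed type.
    fn exp_unbiased(self) -> i32 {
        self.exp_biased() as i32 - Self::EXP_BIAS as i32
    }
}

impl FloatSketch for f64 {
    const EPSILON: Self = f64::EPSILON;
    const SIG_BITS: u32 = 52;
    const EXP_BIAS: u32 = 1023;

    fn exp_biased(self) -> u32 {
        // After shifting out the 52 significand bits, the 11-bit exponent
        // fits in a u32 even though the float itself is 64 bits wide.
        ((self.to_bits() >> Self::SIG_BITS) & 0x7ff) as u32
    }
}
```

For example, `1.0f64.exp_biased()` is `1023` (the bias itself) and `0.5f64.exp_unbiased()` is `-1`.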
tgross35 force-pushed the generic-ceil branch 2 times, most recently from 060274d to 2938a7a (January 22, 2025 07:20)

tgross35 commented Jan 22, 2025

Current results show an improvement with this algorithm:

icount::icount_bench_ceil_group::icount_bench_ceil logspace:setup_ceil()
  Baselines:                      softfloat|softfloat
  Instructions:                        9284|13171                (-29.5118%) [-1.41868x]
  L1 Hits:                            11901|15802                (-24.6867%) [-1.32779x]
  L2 Hits:                                0|1                    (-100.000%) [---inf---]
  RAM Hits:                               8|8                    (No change)
  Total read+write:                   11909|15811                (-24.6790%) [-1.32765x]
  Estimated Cycles:                   12181|16087                (-24.2805%) [-1.32066x]
icount::icount_bench_ceilf128_group::icount_bench_ceilf128 logspace:setup_ceilf128()
  Baselines:                      softfloat|softfloat
  Instructions:                       45271|N/A                  (*********)
  L1 Hits:                            62793|N/A                  (*********)
  L2 Hits:                                0|N/A                  (*********)
  RAM Hits:                              25|N/A                  (*********)
  Total read+write:                   62818|N/A                  (*********)
  Estimated Cycles:                   63668|N/A                  (*********)
icount::icount_bench_ceilf16_group::icount_bench_ceilf16 logspace:setup_ceilf16()
  Baselines:                      softfloat|softfloat
  Instructions:                       34267|N/A                  (*********)
  L1 Hits:                            43963|N/A                  (*********)
  L2 Hits:                                1|N/A                  (*********)
  RAM Hits:                              11|N/A                  (*********)
  Total read+write:                   43975|N/A                  (*********)
  Estimated Cycles:                   44353|N/A                  (*********)
icount::icount_bench_ceilf_group::icount_bench_ceilf logspace:setup_ceilf()
  Baselines:                      softfloat|softfloat
  Instructions:                        9855|9901                 (-0.46460%) [-1.00467x]
  L1 Hits:                            12566|12612                (-0.36473%) [-1.00366x]
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                   12571|12617                (-0.36459%) [-1.00366x]
  Estimated Cycles:                   12741|12787                (-0.35974%) [-1.00361x]
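
The percentage and factor columns in these listings appear to follow a simple convention: the left value is the new run, the right value is the baseline, and the bracketed factor is printed as a negative ratio when the new value is smaller. This reading is inferred from the numbers in this thread, not taken from the benchmark tool's documentation, and the function names below are made up for the sketch.

```rust
/// Relative change of `new` against `old`, in percent.
fn percent_change(new: f64, old: f64) -> f64 {
    (new - old) / old * 100.0
}

/// Signed factor: `new / old` when the value grew,
/// `-(old / new)` when it shrank.
fn signed_factor(new: f64, old: f64) -> f64 {
    if new >= old {
        new / old
    } else {
        -(old / new)
    }
}
```

Applied to the `ceil` instruction counts, `percent_change(9284.0, 13171.0)` is about `-29.51` and `signed_factor(9284.0, 13171.0)` is about `-1.4187`, which agrees with the `(-29.5118%) [-1.41868x]` shown above.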

tgross35 enabled auto-merge January 22, 2025 07:26
tgross35 merged commit 812e15a into rust-lang:master Jan 22, 2025
35 checks passed
tgross35 deleted the generic-ceil branch January 22, 2025 08:14